home *** CD-ROM | disk | FTP | other *** search
Text File | 1991-04-20 | 52.3 KB | 1,418 lines |
-
-
-
-
- Network Working Group R. Braden
- Request for Comments: 1071 ISI
- D. Borman
- Cray Research
- C. Partridge
- BBN Laboratories
- September 1988
-
-
- Computing the Internet Checksum
-
-
- Status of This Memo
-
- This memo summarizes techniques and algorithms for efficiently
- computing the Internet checksum. It is not a standard, but a set of
- useful implementation techniques. Distribution of this memo is
- unlimited.
-
- 1. Introduction
-
- This memo discusses methods for efficiently computing the Internet
- checksum that is used by the standard Internet protocols IP, UDP, and
- TCP.
-
- An efficient checksum implementation is critical to good performance.
- As advances in implementation techniques streamline the rest of the
- protocol processing, the checksum computation becomes one of the
- limiting factors on TCP performance, for example. It is usually
- appropriate to carefully hand-craft the checksum routine, exploiting
- every machine-dependent trick possible; a fraction of a microsecond
- per TCP data byte can add up to a significant CPU time savings
- overall.
-
- In outline, the Internet checksum algorithm is very simple:
-
- (1) Adjacent octets to be checksummed are paired to form 16-bit
- integers, and the 1's complement sum of these 16-bit integers is
- formed.
-
- (2) To generate a checksum, the checksum field itself is cleared,
- the 16-bit 1's complement sum is computed over the octets
- concerned, and the 1's complement of this sum is placed in the
- checksum field.
-
- (3) To check a checksum, the 1's complement sum is computed over the
- same set of octets, including the checksum field. If the result
- is all 1 bits (-0 in 1's complement arithmetic), the check
- succeeds.
-
- Suppose a checksum is to be computed over the sequence of octets
-
-
-
- Braden, Borman, & Partridge [Page 1]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- A, B, C, D, ... , Y, Z. Using the notation [a,b] for the 16-bit
- integer a*256+b, where a and b are bytes, then the 16-bit 1's
- complement sum of these bytes is given by one of the following:
-
- [A,B] +' [C,D] +' ... +' [Y,Z] [1]
-
- [A,B] +' [C,D] +' ... +' [Z,0] [2]
-
- where +' indicates 1's complement addition. These cases
- correspond to an even or odd count of bytes, respectively.
-
- On a 2's complement machine, the 1's complement sum must be
- computed by means of an "end around carry", i.e., any overflows
- from the most significant bits are added into the least
- significant bits. See the examples below.
-
- Section 2 explores the properties of this checksum that may be
- exploited to speed its calculation. Section 3 contains some
- numerical examples of the most important implementation
- techniques. Finally, Section 4 includes examples of specific
- algorithms for a variety of common CPU types. We are grateful
- to Van Jacobson and Charley Kline for their contribution of
- algorithms to this section.
-
- The properties of the Internet checksum were originally
- discussed by Bill Plummer in IEN-45, entitled "Checksum Function
- Design". Since IEN-45 has not been widely available, we include
- it as an extended appendix to this RFC.
-
- 2. Calculating the Checksum
-
- This simple checksum has a number of wonderful mathematical
- properties that may be exploited to speed its calculation, as we
- will now discuss.
-
-
- (A) Commutative and Associative
-
- As long as the even/odd assignment of bytes is respected, the
- sum can be done in any order, and it can be arbitrarily split
- into groups.
-
- For example, the sum [1] could be split into:
-
- ( [A,B] +' [C,D] +' ... +' [J,0] )
-
- +' ( [0,K] +' ... +' [Y,Z] ) [3]
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 2]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- (B) Byte Order Independence
-
- The sum of 16-bit integers can be computed in either byte order.
- Thus, if we calculate the swapped sum:
-
- [B,A] +' [D,C] +' ... +' [Z,Y] [4]
-
- the result is the same as [1], except the bytes are swapped in
- the sum! To see why this is so, observe that in both orders the
- carries are the same: from bit 15 to bit 0 and from bit 7 to bit
- 8. In other words, consistently swapping bytes simply rotates
- the bits within the sum, but does not affect their internal
- ordering.
-
- Therefore, the sum may be calculated in exactly the same way
- regardless of the byte order ("big-endian" or "little-endian")
- of the underlaying hardware. For example, assume a "little-
- endian" machine summing data that is stored in memory in network
- ("big-endian") order. Fetching each 16-bit word will swap
- bytes, resulting in the sum [4]; however, storing the result
- back into memory will swap the sum back into network byte order.
-
- Byte swapping may also be used explicitly to handle boundary
- alignment problems. For example, the second group in [3] can be
- calculated without concern to its odd/even origin, as:
-
- [K,L] +' ... +' [Z,0]
-
- if this sum is byte-swapped before it is added to the first
- group. See the example below.
-
- (C) Parallel Summation
-
- On machines that have word-sizes that are multiples of 16 bits,
- it is possible to develop even more efficient implementations.
- Because addition is associative, we do not have to sum the
- integers in the order they appear in the message. Instead we
- can add them in "parallel" by exploiting the larger word size.
-
- To compute the checksum in parallel, simply do a 1's complement
- addition of the message using the native word size of the
- machine. For example, on a 32-bit machine we can add 4 bytes at
- a time: [A,B,C,D]+'... When the sum has been computed, we "fold"
- the long sum into 16 bits by adding the 16-bit segments. Each
- 16-bit addition may produce new end-around carries that must be
- added.
-
- Furthermore, again the byte order does not matter; we could
- instead sum 32-bit words: [D,C,B,A]+'... or [B,A,D,C]+'... and
- then swap the bytes of the final 16-bit sum as necessary. See
- the examples below. Any permutation is allowed that collects
-
-
-
- Braden, Borman, & Partridge [Page 3]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- all the even-numbered data bytes into one sum byte and the odd-
- numbered data bytes into the other sum byte.
-
-
- There are further coding techniques that can be exploited to speed up
- the checksum calculation.
-
- (1) Deferred Carries
-
- Depending upon the machine, it may be more efficient to defer
- adding end-around carries until the main summation loop is
- finished.
-
- One approach is to sum 16-bit words in a 32-bit accumulator, so
- the overflows build up in the high-order 16 bits. This approach
- typically avoids a carry-sensing instruction but requires twice
- as many additions as would adding 32-bit segments; which is
- faster depends upon the detailed hardware architecture.
-
- (2) Unwinding Loops
-
- To reduce the loop overhead, it is often useful to "unwind" the
- inner sum loop, replicating a series of addition commands within
- one loop traversal. This technique often provides significant
- savings, although it may complicate the logic of the program
- considerably.
-
- (3) Combine with Data Copying
-
- Like checksumming, copying data from one memory location to
- another involves per-byte overhead. In both cases, the
- bottleneck is essentially the memory bus, i.e., how fast the
- data can be fetched. On some machines (especially relatively
- slow and simple micro-computers), overhead can be significantly
- reduced by combining memory-to-memory copy and the checksumming,
- fetching the data only once for both.
-
- (4) Incremental Update
-
- Finally, one can sometimes avoid recomputing the entire checksum
- when one header field is updated. The best-known example is a
- gateway changing the TTL field in the IP header, but there are
- other examples (for example, when updating a source route). In
- these cases it is possible to update the checksum without
- scanning the message or datagram.
-
- To update the checksum, simply add the differences of the
- sixteen bit integers that have been changed. To see why this
- works, observe that every 16-bit integer has an additive inverse
- and that addition is associative. From this it follows that
- given the original value m, the new value m', and the old
-
-
-
- Braden, Borman, & Partridge [Page 4]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- checksum C, the new checksum C' is:
-
- C' = C + (-m) + m' = C + (m' - m)
-
-
- 3. Numerical Examples
-
- We now present explicit examples of calculating a simple 1's
- complement sum on a 2's complement machine. The examples show the
- same sum calculated byte by bye, by 16-bits words in normal and
- swapped order, and 32 bits at a time in 3 different orders. All
- numbers are in hex.
-
- Byte-by-byte "Normal" Swapped
- Order Order
-
- Byte 0/1: 00 01 0001 0100
- Byte 2/3: f2 03 f203 03f2
- Byte 4/5: f4 f5 f4f5 f5f4
- Byte 6/7: f6 f7 f6f7 f7f6
- --- --- ----- -----
- Sum1: 2dc 1f0 2ddf0 1f2dc
-
- dc f0 ddf0 f2dc
- Carrys: 1 2 2 1
- -- -- ---- ----
- Sum2: dd f2 ddf2 f2dd
-
- Final Swap: dd f2 ddf2 ddf2
-
-
- Byte 0/1/2/3: 0001f203 010003f2 03f20100
- Byte 4/5/6/7: f4f5f6f7 f5f4f7f6 f7f6f5f4
- -------- -------- --------
- Sum1: 0f4f7e8fa 0f6f4fbe8 0fbe8f6f4
-
- Carries: 0 0 0
-
- Top half: f4f7 f6f4 fbe8
- Bottom half: e8fa fbe8 f6f4
- ----- ----- -----
- Sum2: 1ddf1 1f2dc 1f2dc
-
- ddf1 f2dc f2dc
- Carrys: 1 1 1
- ---- ---- ----
- Sum3: ddf2 f2dd f2dd
-
- Final Swap: ddf2 ddf2 ddf2
-
-
-
-
-
- Braden, Borman, & Partridge [Page 5]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Finally, here an example of breaking the sum into two groups, with
- the second group starting on a odd boundary:
-
-
- Byte-by-byte Normal
- Order
-
- Byte 0/1: 00 01 0001
- Byte 2/ : f2 (00) f200
- --- --- -----
- Sum1: f2 01 f201
-
- Byte 4/5: 03 f4 03f4
- Byte 6/7: f5 f6 f5f6
- Byte 8/: f7 (00) f700
- --- --- -----
- Sum2: 1f0ea
-
- Sum2: f0ea
- Carry: 1
- -----
- Sum3: f0eb
-
- Sum1: f201
- Sum3 byte swapped: ebf0
- -----
- Sum4: 1ddf1
-
- Sum4: ddf1
- Carry: 1
- -----
- Sum5: ddf2
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 6]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- 4. Implementation Examples
-
- In this section we show examples of Internet checksum implementation
- algorithms that have been found to be efficient on a variety of
- CPU's. In each case, we show the core of the algorithm, without
- including environmental code (e.g., subroutine linkages) or special-
- case code.
-
- 4.1 "C"
-
- The following "C" code algorithm computes the checksum with an inner
- loop that sums 16-bits at a time in a 32-bit accumulator.
-
- in 6
- {
- /* Compute Internet Checksum for "count" bytes
- * beginning at location "addr".
- */
- register long sum = 0;
-
- while( count > 1 ) {
- /* This is the inner loop */
- sum += * (unsigned short) addr++;
- count -= 2;
- }
-
- /* Add left-over byte, if any */
- if( count > 0 )
- sum += * (unsigned char *) addr;
-
- /* Fold 32-bit sum to 16 bits */
- while (sum>>16)
- sum = (sum & 0xffff) + (sum >> 16);
-
- checksum = ~sum;
- }
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 7]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- 4.2 Motorola 68020
-
- The following algorithm is given in assembler language for a Motorola
- 68020 chip. This algorithm performs the sum 32 bits at a time, and
- unrolls the loop with 16 replications. For clarity, we have omitted
- the logic to add the last fullword when the length is not a multiple
- of 4. The result is left in register d0.
-
- With a 20MHz clock, this routine was measured at 134 usec/kB summing
- random data. This algorithm was developed by Van Jacobson.
-
-
- movl d1,d2
- lsrl #6,d1 | count/64 = # loop traversals
- andl #0x3c,d2 | Then find fractions of a chunk
- negl d2
- andb #0xf,cc | Clear X (extended carry flag)
-
- jmp pc@(2$-.-2:b,d2) | Jump into loop
-
- 1$: | Begin inner loop...
-
- movl a0@+,d2 | Fetch 32-bit word
- addxl d2,d0 | Add word + previous carry
- movl a0@+,d2 | Fetch 32-bit word
- addxl d2,d0 | Add word + previous carry
-
- | ... 14 more replications
- 2$:
- dbra d1,1$ | (NB- dbra doesn't affect X)
-
- movl d0,d1 | Fold 32 bit sum to 16 bits
- swap d1 | (NB- swap doesn't affect X)
- addxw d1,d0
- jcc 3$
- addw #1,d0
- 3$:
- andl #0xffff,d0
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 8]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- 4.3 Cray
-
- The following example, in assembler language for a Cray CPU, was
- contributed by Charley Kline. It implements the checksum calculation
- as a vector operation, summing up to 512 bytes at a time with a basic
- summation unit of 32 bits. This example omits many details having to
- do with short blocks, for clarity.
-
- Register A1 holds the address of a 512-byte block of memory to
- checksum. First two copies of the data are loaded into two vector
- registers. One is vector-shifted right 32 bits, while the other is
- vector-ANDed with a 32 bit mask. Then the two vectors are added
- together. Since all these operations chain, it produces one result
- per clock cycle. Then it collapses the result vector in a loop that
- adds each element to a scalar register. Finally, the end-around
- carry is performed and the result is folded to 16-bits.
-
- EBM
- A0 A1
- VL 64 use full vectors
- S1 <32 form 32-bit mask from the right.
- A2 32
- V1 ,A0,1 load packet into V1
- V2 S1&V1 Form right-hand 32-bits in V2.
- V3 V1>A2 Form left-hand 32-bits in V3.
- V1 V2+V3 Add the two together.
- A2 63 Prepare to collapse into a scalar.
- S1 0
- S4 <16 Form 16-bit mask from the right.
- A4 16
- CK$LOOP S2 V1,A2
- A2 A2-1
- A0 A2
- S1 S1+S2
- JAN CK$LOOP
- S2 S1&S4 Form right-hand 16-bits in S2
- S1 S1>A4 Form left-hand 16-bits in S1
- S1 S1+S2
- S2 S1&S4 Form right-hand 16-bits in S2
- S1 S1>A4 Form left-hand 16-bits in S1
- S1 S1+S2
- S1 #S1 Take one's complement
- CMR At this point, S1 contains the checksum.
-
-
-
-
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 9]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- 4.4 IBM 370
-
- The following example, in assembler language for an IBM 370 CPU, sums
- the data 4 bytes at a time. For clarity, we have omitted the logic
- to add the last fullword when the length is not a multiple of 4, and
- to reverse the bytes when necessary. The result is left in register
- RCARRY.
-
- This code has been timed on an IBM 3090 CPU at 27 usec/KB when
- summing all one bits. This time is reduced to 24.3 usec/KB if the
- trouble is taken to word-align the addends (requiring special cases
- at both the beginning and the end, and byte-swapping when necessary
- to compensate for starting on an odd byte).
-
- * Registers RADDR and RCOUNT contain the address and length of
- * the block to be checksummed.
- *
- * (RCARRY, RSUM) must be an even/odd register pair.
- * (RCOUNT, RMOD) must be an even/odd register pair.
- *
- CHECKSUM SR RSUM,RSUM Clear working registers.
- SR RCARRY,RCARRY
- LA RONE,1 Set up constant 1.
- *
- SRDA RCOUNT,6 Count/64 to RCOUNT.
- AR RCOUNT,RONE +1 = # times in loop.
- SRL RMOD,26 Size of partial chunk to RMOD.
- AR RADDR,R3 Adjust addr to compensate for
- S RADDR,=F(64) jumping into the loop.
- SRL RMOD,1 (RMOD/4)*2 is halfword index.
- LH RMOD,DOPEVEC9(RMOD) Use magic dope-vector for offset,
- B LOOP(RMOD) and jump into the loop...
- *
- * Inner loop:
- *
- LOOP AL RSUM,0(,RADDR) Add Logical fullword
- BC 12,*+6 Branch if no carry
- AR RCARRY,RONE Add 1 end-around
- AL RSUM,4(,RADDR) Add Logical fullword
- BC 12,*+6 Branch if no carry
- AR RCARRY,RONE Add 1 end-around
- *
- * ... 14 more replications ...
- *
- A RADDR,=F'64' Increment address ptr
- BCT RCOUNT,LOOP Branch on Count
- *
- * Add Carries into sum, and fold to 16 bits
- *
- ALR RCARRY,RSUM Add SUM and CARRY words
- BC 12,*+6 and take care of carry
-
-
-
- Braden, Borman, & Partridge [Page 10]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- AR RCARRY,RONE
- SRDL RCARRY,16 Fold 32-bit sum into
- SRL RSUM,16 16-bits
- ALR RCARRY,RSUM
- C RCARRY,=X'0000FFFF' and take care of any
- BNH DONE last carry
- S RCARRY,=X'0000FFFF'
- DONE X RCARRY,=X'0000FFFF' 1's complement
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 11]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- IEN 45
- Section 2.4.4.5
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- TCP Checksum Function Design
-
-
-
- William W. Plummer
-
-
- Bolt Beranek and Newman, Inc.
- 50 Moulton Street
- Cambridge MA 02138
-
-
- 5 June 1978
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 12]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- 1. Introduction
-
- Checksums are included in packets in order that errors
- encountered during transmission may be detected. For Internet
- protocols such as TCP [1,9] this is especially important because
- packets may have to cross wireless networks such as the Packet
- Radio Network [2] and Atlantic Satellite Network [3] where
- packets may be corrupted. Internet protocols (e.g., those for
- real time speech transmission) can tolerate a certain level of
- transmission errors and forward error correction techniques or
- possibly no checksum at all might be better. The focus in this
- paper is on checksum functions for protocols such as TCP where
- the required reliable delivery is achieved by retransmission.
-
- Even if the checksum appears good on a message which has been
- received, the message may still contain an undetected error. The
- probability of this is bounded by 2**(-C) where C is the number
- of checksum bits. Errors can arise from hardware (and software)
- malfunctions as well as transmission errors. Hardware induced
- errors are usually manifested in certain well known ways and it
- is desirable to account for this in the design of the checksum
- function. Ideally no error of the "common hardware failure" type
- would go undetected.
-
- An example of a failure that the current checksum function
- handles successfully is picking up a bit in the network interface
- (or I/O buss, memory channel, etc.). This will always render the
- checksum bad. For an example of how the current function is
- inadequate, assume that a control signal stops functioning in the
- network interface and the interface stores zeros in place of the
- real data. These "all zero" messages appear to have valid
- checksums. Noise on the "There's Your Bit" line of the ARPANET
- Interface [4] may go undetected because the extra bits input may
- cause the checksum to be perturbed (i.e., shifted) in the same
- way as the data was.
-
- Although messages containing undetected errors will occasionally
- be passed to higher levels of protocol, it is likely that they
- will not make sense at that level. In the case of TCP most such
- messages will be ignored, but some could cause a connection to be
- aborted. Garbled data could be viewed as a problem for a layer
- of protocol above TCP which itself may have a checksuming scheme.
-
- This paper is the first step in design of a new checksum function
- for TCP and some other Internet protocols. Several useful
- properties of the current function are identified. If possible
-
- - 1 -
-
-
-
- Braden, Borman, & Partridge [Page 13]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- these should be retained in any new function. A number of
- plausible checksum schemes are investigated. Of these only the
- "product code" seems to be simple enough for consideration.
-
- 2. The Current TCP Checksum Function
-
- The current function is oriented towards sixteen-bit machines
- such as the PDP-11 but can be computed easily on other machines
- (e.g., PDP-10). A packet is thought of as a string of 16-bit
- bytes and the checksum function is the one's complement sum (add
- with end-around carry) of those bytes. It is the one's
- complement of this sum which is stored in the checksum field of
- the TCP header. Before computing the checksum value, the sender
- places a zero in the checksum field of the packet. If the
- checksum value computed by a receiver of the packet is zero, the
- packet is assumed to be valid. This is a consequence of the
- "negative" number in the checksum field exactly cancelling the
- contribution of the rest of the packet.
-
- Ignoring the difficulty of actually evaluating the checksum
- function for a given packet, the way of using the checksum
- described above is quite simple, but it assumes some properties
- of the checksum operator (one's complement addition, "+" in what
- follows):
-
- (P1) + is commutative. Thus, the order in which
- the 16-bit bytes are "added" together is
- unimportant.
-
- (P2) + has at least one identity element (The
- current function has two: +0 and -0). This
- allows the sender to compute the checksum
- function by placing a zero in the packet checksum
- field before computing the value.
-
- (P3) + has an inverse. Thus, the receiver may
- evaluate the checksum function and expect a zero.
-
- (P4) + is associative, allowing the checksum field
- to be anywhere in the packet and the 16-bit bytes
- to be scanned sequentially.
-
- Mathematically, these properties of the binary operation "+" over
- the set of 16-bit numbers forms an Abelian group [5]. Of course,
- there are many Abelian groups but not all would be satisfactory
- for use as checksum operators. (Another operator readily
-
- - 2 -
-
-
-
- Braden, Borman, & Partridge [Page 14]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- available in the PDP-11 instruction set that has all of these
- properties is exclusive-OR, but XOR is unsatisfactory for other
- reasons.)
-
- Albeit imprecise, another property which must be preserved in any
- future checksum scheme is:
-
- (P5) + is fast to compute on a variety of machines
- with limited storage requirements.
-
- The current function is quite good in this respect. On the
- PDP-11 the inner loop looks like:
-
- LOOP: ADD (R1)+,R0 ; Add the next 16-bit byte
- ADC R0 ; Make carry be end-around
- SOB R2,LOOP ; Loop over entire packet.
-
- ( 4 memory cycles per 16-bit byte )
-
- On the PDP-10 properties P1-4 are exploited further and two
- 16-bit bytes per loop are processed:
-
- LOOP: ILDB THIS,PTR ; Get 2 16-bit bytes
- ADD SUM,THIS ; Add into current sum
- JUMPGE SUM,CHKSU2 ; Jump if fewer than 8 carries
- LDB THIS,[POINT 20,SUM,19] ; Get left 16 and carries
- ANDI SUM,177777 ; Save just low 16 here
- ADD SUM,THIS ; Fold in carries
- CHKSU2: SOJG COUNT,LOOP ; Loop over entire packet
-
- ( 3.1 memory cycles per 16-bit byte )
-
- The "extra" instruction in the loops above are required to
- convert the two's complement ADD instruction(s) into a one's
- complement add by making the carries be end-around. One's
- complement arithmetic is better than two's complement because it
- is equally sensitive to errors in all bit positions. If two's
- complement addition were used, an even number of 1's could be
- dropped (or picked up) in the most significant bit channel
- without affecting the value of the checksum. It is just this
- property that makes some sort of addition preferable to a simple
- exclusive-OR which is frequently used but permits an even number
- of drops (pick ups) in any bit channel. RIM10B paper tape format
- used on PDP-10s [10] uses two's complement add because space for
- the loader program is extremely limited.
-
- - 3 -
-
-
-
-
- Braden, Borman, & Partridge [Page 15]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- Another property of the current checksum scheme is:
-
- (P6) Adding the checksum to a packet does not change
- the information bytes. Peterson [6] calls this a
- "systematic" code.
-
- This property allows intermediate computers such as gateway
- machines to act on fields (i.e., the Internet Destination
- Address) without having to first decode the packet. Cyclical
- Redundancy Checks used for error correction are not systematic
- either. However, most applications of CRCs tend to emphasize
- error detection rather than correction and consequently can send
- the message unchanged, with the CRC check bits being appended to
- the end. The 24-bit CRC used by ARPANET IMPs and Very Distant
- Host Interfaces [4] and the ANSI standards for 800 and 6250 bits
- per inch magnetic tapes (described in [11]) use this mode.
-
- Note that the operation of higher level protocols are not (by
- design) affected by anything that may be done by a gateway acting
- on possibly invalid packets. It is permissible for gateways to
- validate the checksum on incoming packets, but in general
- gateways will not know how to do this if the checksum is a
- protocol-specific feature.
-
- A final property of the current checksum scheme which is actually
- a consequence of P1 and P4 is:
-
- (P7) The checksum may be incrementally modified.
-
- This property permits an intermediate gateway to add information
- to a packet, for instance a timestamp, and "add" an appropriate
- change to the checksum field of the packet. Note that the
- checksum will still be end-to-end since it was not fully
- recomputed.
-
- 3. Product Codes
-
- Certain "product codes" are potentially useful for checksuming
- purposes. The following is a brief description of product codes
- in the context of TCP. More general treatment can be found in
- Avizienis [7] and probably other more recent works.
-
- The basic concept of this coding is that the message (packet) to
- be sent is formed by transforming the original source message and
- adding some "check" bits. By reading this and applying a
- (possibly different) transformation, a receiver can reconstruct
-
- - 4 -
-
-
-
- Braden, Borman, & Partridge [Page 16]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- the original message and determine if it has been corrupted
- during transmission.
-
- Mo Ms Mr
-
- ----- ----- -----
- | A | code | 7 | decode | A |
- | B | ==> | 1 | ==> | B |
- | C | | 4 | | C |
- ----- |...| -----
- | 2 | check plus "valid" flag
- ----- info
-
- Original Sent Reconstructed
-
- With product codes the transformation is Ms = K * Mo . That is,
- the message sent is simply the product of the original message
- Mo and some well known constant K . To decode, the received
- Ms is divided by K which will yield Mr as the quotient and
- 0 as the remainder if Mr is to be considered the same as Mo .
-
- The first problem is selecting a "good" value for K, the "check
- factor". K must be relatively prime to the base chosen to
- express the message. (Example: Binary messages with K
- incorrectly chosen to be 8. This means that Ms looks exactly
- like Mo except that three zeros have been appended. The only
- way the message could look bad to a receiver dividing by 8 is if
- the error occurred in one of those three bits.)
-
- For TCP the base R will be chosen to be 2**16. That is, every
- 16-bit byte (word on the PDP-11) will be considered as a digit of
- a big number and that number is the message. Thus,
-
- Mo = SIGMA [ Bi * (R**i)] , Bi is i-th byte
- i=0 to N
-
- Ms = K * Mo
-
- Corrupting a single digit of Ms will yield Ms' = Ms +or-
- C*(R**j) for some radix position j . The receiver will compute
- Ms'/K = Mo +or- C(R**j)/K. Since R and K are relatively prime,
- C*(R**j) cannot be any exact multiple of K. Therefore, the
- division will result in a non-zero remainder which indicates that
- Ms' is a corrupted version of Ms. As will be seen, a good
- choice for K is (R**b - 1), for some b which is the "check
- length" which controls the degree of detection to be had for
-
- - 5 -
-
-
-
- Braden, Borman, & Partridge [Page 17]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- burst errors which affect a string of digits (i.e., 16-bit bytes)
- in the message. In fact b will be chosen to be 1, so K will
- be 2**16 - 1 so that arithmetic operations will be simple. This
- means that all bursts of 15 or fewer bits will be detected.
- According to [7] this choice for b results in the following
- expression for the fraction of undetected weight 2 errors:
-
- f = 16(k-1)/[32(16k-3) + (6/k)] where k is the message length.
-
- For large messages f approaches 3.125 per cent as k goes to
- infinity.
-
- Multiple precision multiplication and division are normally quite
- complex operations, especially on small machines which typically
- lack even single precision multiply and divide operations. The
- exception to this is exactly the case being dealt with here --
- the factor is 2**16 - 1 on machines with a word length of 16
- bits. The reason for this is due to the following identity:
-
- Q*(R**j) = Q, mod (R-1) 0 <= Q < R
-
- That is, any digit Q in the selected radix (0, 1, ... R-1)
- multiplied by any power of the radix will have a remainder of Q
- when divided by the radix minus 1.
-
- Example: In decimal R = 10. Pick Q = 6.
-
- 6 = 0 * 9 + 6 = 6, mod 9
- 60 = 6 * 9 + 6 = 6, mod 9
- 600 = 66 * 9 + 6 = 6, mod 9 etc.
-
- More to the point, rem(31415/9) = rem((30000+1000+400+10+5)/9)
- = (3 mod 9) + (1 mod 9) + (4 mod 9) + (1 mod 9) + (5 mod 9)
- = (3+1+4+1+5) mod 9
- = 14 mod 9
- = 5
-
- So, the remainder of a number divided by the radix minus one can
- be found by simply summing the digits of the number. Since the
- radix in the TCP case has been chosen to be 2**16 and the check
- factor is 2**16 - 1, a message can quickly be checked by summing
- all of the 16-bit words (on a PDP-11), with carries being
- end-around. If zero is the result, the message can be considered
- valid. Thus, checking a product coded message is exactly the
- same complexity as with the current TCP checksum!
-
- - 6 -
-
-
-
-
- Braden, Borman, & Partridge [Page 18]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- In order to form Ms, the sender must multiply the multiple
- precision "number" Mo by 2**16 - 1. Or, Ms = (2**16)Mo - Mo.
- This is performed by shifting Mo one whole word's worth of
- precision and subtracting Mo. Since carries must propagate
- between digits, but it is only the current digit which is of
- interest, one's complement arithmetic is used.
-
- (2**16)Mo = Mo0 + Mo1 + Mo2 + ... + MoX + 0
- - Mo = - ( Mo0 + Mo1 + ......... + MoX)
- --------- ----------------------------------
- Ms = Ms0 + Ms1 + ... - MoX
-
- A loop which implements this function on a PDP-11 might look
- like:
- LOOP: MOV -2(R2),R0 ; Next byte of (2**16)Mo
- SBC R0 ; Propagate carries from last SUB
- SUB (R2)+,R0 ; Subtract byte of Mo
- MOV R0,(R3)+ ; Store in Ms
- SOB R1,LOOP ; Loop over entire message
- ; 8 memory cycles per 16-bit byte
-
- Note that the coding procedure is not done in-place since it is
- not systematic. In general the original copy, Mo, will have to
- be retained by the sender for retransmission purposes and
- therefore must remain readable. Thus the MOV R0,(R3)+ is
- required which accounts for 2 of the 8 memory cycles per loop.
-
- The coding procedure will add exactly one 16-bit word to the
- message since Ms < (2**16)Mo . This additional 16 bits will be
- at the tail of the message, but may be moved into the defined
- location in the TCP header immediately before transmission. The
- receiver will have to undo this to put Ms back into standard
- format before decoding the message.
-
- The code in the receiver for fully decoding the message may be
- inferred by observing that any word in Ms contains the
- difference between two successive words of Mo minus the carries
- from the previous word, and the low order word contains minus the
- low word of Mo. So the low order (i.e., rightmost) word of Mr is
- just the negative of the low order byte of Ms. The next word of
- Mr is the next word of Ms plus the just computed word of Mr
- plus the carry from that previous computation.
-
- A slight refinement of the procedure is required in order to
- protect against an all-zero message passing to the destination.
- This will appear to have a valid checksum because Ms'/K = 0/K
-
- - 7 -
-
-
-
- Braden, Borman, & Partridge [Page 19]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- = 0 with 0 remainder. The refinement is to make the coding be
- Ms = K*Mo + C where C is some arbitrary, well-known constant.
- Adding this constant requires a second pass over the message, but
- this will typically be very short since it can stop as soon as
- carries stop propagating. Chosing C = 1 is sufficient in most
- cases.
-
- The product code checksum must be evaluated in terms of the
- desired properties P1 - P7. It has been shown that a factor of
- two more machine cycles are consumed in computing or verifying a
- product code checksum (P5 satisfied?).
-
- Although the code is not systematic, the checksum can be verified
- quickly without decoding the message. If the Internet
- Destination Address is located at the least significant end of
- the packet (where the product code computation begins) then it is
- possible for a gateway to decode only enough of the message to
- see this field without having to decode the entire message.
- Thus, P6 is at least partially satisfied. The algebraic
- properties P1 through P4 are not satisfied, but only a small
- amount of computation is needed to account for this -- the
- message needs to be reformatted as previously mentioned.
-
- P7 is satisfied since the product code checksum can be
- incrementally updated to account for an added word, although the
- procedure is somewhat involved. Imagine that the original
- message has two halves, H1 and H2. Thus, Mo = H1*(R**j) + H2.
- The timestamp word is to be inserted between these halves to form
- a modified Mo' = H1*(R**(j+1)) + T*(R**j) + H2. Since K has
- been chosen to be R-1, the transmitted message Ms' = Mo'(R-1).
- Then,
-
- Ms' = Ms*R + T(R-1)(R**j) + P2((R-1)**2)
-
- = Ms*R + T*(R**(j+1)) + T*(R**j) + P2*(R**2) - 2*P2*R - P2
-
- Recalling that R is 2**16, the word size on the PDP-11,
- multiplying by R means copying down one word in memory. So,
- the first term of Ms' is simply the unmodified message copied
- down one word. The next term is the new data T added into the
- Ms' being formed beginning at the (j+1)th word. The addition is
- fairly easy here since after adding in T all that is left is
- propagating the carry, and that can stop as soon as no carry is
- produced. The other terms can be handle similarly.
-
- - 8 -
-
-
-
-
-
- Braden, Borman, & Partridge [Page 20]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- 4. More Complicated Codes
-
- There exists a wealth of theory on error detecting and correcting
- codes. Peterson [6] is an excellent reference. Most of these
- "CRC" schemes are designed to be implemented using a shift
- register with a feedback network composed of exclusive-ORs.
- Simulating such a logic circuit with a program would be too slow
- to be useful unless some programming trick is discovered.
-
- One such trick has been proposed by Kirstein [8]. Basically, a
- few bits (four or eight) of the current shift register state are
- combined with bits from the input stream (from Mo) and the result
- is used as an index to a table which yields the new shift
- register state and, if the code is not systematic, bits for the
- output stream (Ms). A trial coding of an especially "good" CRC
- function using four-bit bytes showed showed this technique to be
- about four times as slow as the current checksum function. This
- was true for both the PDP-10 and PDP-11 machines. Of the
- desirable properties listed above, CRC schemes satisfy only P3
- (It has an inverse.), and P6 (It is systematic.). Placement of
- the checksum field in the packet is critical and the CRC cannot
- be incrementally modified.
-
- Although the bulk of coding theory deals with binary codes, most
- of the theory works if the alphabet contains q symbols, where
- q is a power of a prime number. For instance q taken as 2**16
- should make a great deal of the theory useful on a word-by-word
- basis.
-
- 5. Outboard Processing
-
- When a function such as computing an involved checksum requires
- extensive processing, one solution is to put that processing into
- an outboard processor. In this way "encode message" and "decode
- message" become single instructions which do not tax the main
- host processor. The Digital Equipment Corporation VAX/780
- computer is equipped with special hardware for generating and
- checking CRCs [13]. In general this is not a very good solution
- since such a processor must be constructed for every different
- host machine which uses TCP messages.
-
- It is conceivable that the gateway functions for a large host may
- be performed entirely in an "Internet Frontend Machine". This
- machine would be responsible for forwarding packets received
-
- - 9 -
-
-
-
-
-
- Braden, Borman, & Partridge [Page 21]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- either from the network(s) or from the Internet protocol modules
- in the connected host, and for reassembling Internet fragments
- into segments and passing these to the host. Another capability
- of this machine would be to check the checksum so that the
- segments given to the host are known to be valid at the time they
- leave the frontend. Since computer cycles are assumed to be both
- inexpensive and available in the frontend, this seems reasonable.
-
- The problem with attempting to validate checksums in the frontend
- is that it destroys the end-to-end character of the checksum. If
- anything, this is the most powerful feature of the TCP checksum!
- There is a way to make the host-to-frontend link be covered by
- the end-to-end checksum. A separate, small protocol must be
- developed to cover this link. After having validated an incoming
- packet from the network, the frontend would pass it to the host
- saying "here is an Internet segment for you. Call it #123". The
- host would save this segment, and send a copy back to the
- frontend saying, "Here is what you gave me as #123. Is it OK?".
- The frontend would then do a word-by-word comparison with the
- first transmission, and tell the host either "Here is #123
- again", or "You did indeed receive #123 properly. Release it to
- the appropriate module for further processing."
-
- The headers on the messages crossing the host-frontend link would
- most likely be covered by a fairly strong checksum so that
- information like which function is being performed and the
- message reference numbers are reliable. These headers would be
- quite short, maybe only sixteen bits, so the checksum could be
- quite strong. The bulk of the message would not be checksumed of
- course.
- The reason this scheme reduces the computing burden on the host
- is that all that is required in order to validate the message
- using the end-to-end checksum is to send it back to the frontend
- machine. In the case of the PDP-10, this requires only 0.5
- memory cycles per 16-bit byte of Internet message, and only a few
- processor cycles to setup the required transfers.
-
- 6. Conclusions
-
- There is an ordering of checksum functions: first and simplest is
- none at all which provides no error detection or correction.
- Second, is sending a constant which is checked by the receiver.
- This also is extremely weak. Third, the exclusive-OR of the data
- may be sent. XOR takes the minimal amount of computer time to
- generate and check, but is not a good checksum. A two's
- complement sum of the data is somewhat better and takes no more
-
- - 10 -
-
-
-
- Braden, Borman, & Partridge [Page 22]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- computer time to compute. Fifth, is the one's complement sum
- which is what is currently used by TCP. It is slightly more
- expensive in terms of computer time. The next step is a product
- code. The product code is strongly related to one's complement
- sum, takes still more computer time to use, provides a bit more
- protection against common hardware failures, but has some
- objectionable properties. Next is a genuine CRC polynomial code,
- used for checking purposes only. This is very expensive for a
- program to implement. Finally, a full CRC error correcting and
- detecting scheme may be used.
-
- For TCP and Internet applications the product code scheme is
- viable. It suffers mainly in that messages must be (at least
- partially) decoded by intermediate gateways in order that they
- can be forwarded. Should product codes not be chosen as an
- improved checksum, some slight modification to the existing
- scheme might be possible. For instance the "add and rotate"
- function used for paper tape by the PDP-6/10 group at the
- Artificial Intelligence Laboratory at M.I.T. Project MAC [12]
- could be useful if it can be proved that it is better than the
- current scheme and that it can be computed efficiently on a
- variety of machines.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- - 11 -
-
-
-
-
-
-
-
- Braden, Borman, & Partridge [Page 23]
-
- RFC 1071 Computing the Internet Checksum September 1988
-
-
- Internet Experiment Note 45 5 June 1978
- TCP Checksum Function Design William W. Plummer
-
- References
-
- [1] Cerf, V.G. and Kahn, Robert E., "A Protocol for Packet Network
- Communications," IEEE Transactions on Communications, vol.
- COM-22, No. 5, May 1974.
-
- [2] Kahn, Robert E., "The Organization of Computer Resources into
- a Packet Radio Network", IEEE Transactions on Communications,
- vol. COM-25, no. 1, pp. 169-178, January 1977.
-
- [3] Jacobs, Irwin, et al., "CPODA - A Demand Assignment Protocol
- for SatNet", Fifth Data Communications Symposium, September
- 27-9, 1977, Snowbird, Utah
-
- [4] Bolt Beranek and Newman, Inc. "Specifications for the
- Interconnection of a Host and an IMP", Report 1822, January
- 1976 edition.
-
- [5] Dean, Richard A., "Elements of Abstract Algebra", John Wyley
- and Sons, Inc., 1966
-
- [6] Peterson, W. Wesley, "Error Correcting Codes", M.I.T. Press
- Cambridge MA, 4th edition, 1968.
-
- [7] Avizienis, Algirdas, "A Study of the Effectiveness of Fault-
- Detecting Codes for Binary Arithmetic", Jet Propulsion
- Laboratory Technical Report No. 32-711, September 1, 1965.
-
- [8] Kirstein, Peter, private communication
-
- [9] Cerf, V. G. and Postel, Jonathan B., "Specification of
- Internetwork Transmission Control Program Version 3",
- University of Southern California Information Sciences
- Institute, January 1978.
-
- [10] Digital Equipment Corporation, "PDP-10 Reference Handbook",
- 1970, pp. 114-5.
-
- [11] Swanson, Robert, "Understanding Cyclic Redundancy Codes",
- Computer Design, November, 1975, pp. 93-99.
-
- [12] Clements, Robert C., private communication.
-
- [13] Conklin, Peter F., and Rodgers, David P., "Advanced
- Minicomputer Designed by Team Evaluation of Hardware/Software
- Tradeoffs", Computer Design, April 1978, pp. 136-7.
-
- - 12 -
-
-
-
- Braden, Borman, & Partridge [Page 24]
-
-